    Extreme Scale De Novo Metagenome Assembly

    Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiomes' genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed a terabyte in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and therefore runs seamlessly on both shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand-challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion reads, 2.6 TB in size. Comment: Accepted to SC18
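    As a rough illustration of the de Bruijn graph approach mentioned above (not MetaHipMer's distributed UPC implementation), the following Python sketch builds a toy de Bruijn graph from a handful of error-free reads; the reads and the k-mer length are invented for the example.

        # Toy de Bruijn graph: nodes are (k-1)-mers, edges link a k-mer's
        # prefix to its suffix. Purely illustrative; real assemblers handle
        # sequencing errors, reverse complements, and distributed hash tables.
        from collections import defaultdict

        def build_debruijn(reads, k=5):
            graph = defaultdict(set)
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
            return graph

        reads = ["ACGTACGT", "CGTACGTT"]  # hypothetical reads
        for node, successors in sorted(build_debruijn(reads).items()):
            print(node, "->", sorted(successors))

    Walking unambiguous paths through such a graph yields contigs; the iterative approach the abstract refers to repeats this kind of construction over a sequence of k values before scaffolding.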

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies, and we compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
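    To make the hashing motif concrete, here is a small hedged Python sketch: k-mers from reads are assigned to an owner partition by hash, mimicking the asynchronous updates a distributed hash table would send to remote processes. The partition count, k, and the reads are invented for illustration.

        # Hashing motif: each k-mer is owned by the partition its hash selects,
        # so every update lands on the owner's local table, as it would in a
        # distributed hash table with asynchronous remote updates.
        from collections import Counter

        def owner(kmer, nparts):
            return hash(kmer) % nparts  # which partition owns this k-mer

        def count_kmers(reads, k=4, nparts=4):
            tables = [Counter() for _ in range(nparts)]  # one table per owner
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    tables[owner(kmer, nparts)][kmer] += 1  # "remote" update
            return tables

        tables = count_kmers(["ACGTACGTAC", "GTACGTACGT"])
        print(sum(tables, Counter()).most_common(3))

    The sorting motif plays a complementary role: sorting reads or k-mers by key achieves the same grouping with regular, bulk communication instead of fine-grained asynchronous updates.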

    ESTimating plant phylogeny: lessons from partitioning

    BACKGROUND: While Expressed Sequence Tags (ESTs) have proven a viable and efficient way to sample genomes, particularly those for which whole-genome sequencing is impractical, phylogenetic analysis using ESTs remains difficult. Sequencing errors and orthology determination are the major problems when using ESTs as a source of characters for systematics. Here we develop methods to incorporate EST sequence information in a simultaneous analysis framework to address controversial phylogenetic questions regarding the relationships among the major groups of seed plants. We use an automated, phylogenetically derived approach to orthology determination called OrthologID to generate a phylogeny based on 43 process partitions, many of which are derived from ESTs, and examine several measures of support to assess the utility of EST data for phylogenies. RESULTS: A maximum parsimony (MP) analysis resulted in a single tree with relatively high support at all nodes in the tree, despite rampant conflict among trees generated from the separate analysis of individual partitions. In a comparison of broader-scale groupings based on cellular compartment (i.e., chloroplast, mitochondrial, or nuclear) or function, only the nuclear partition tree (based largely on EST data) was found to be topologically identical to the tree based on the simultaneous analysis of all data. Despite topological conflict among the broader-scale groupings examined, only the tree based on morphological data showed statistically significant differences. CONCLUSION: Based on the amount of character support contributed by EST data, which make up a majority of the nuclear data set, and the lack of conflict of the nuclear data set with the simultaneous analysis tree, we conclude that the inclusion of EST data provides a viable and efficient approach to addressing phylogenetic questions within a parsimony framework on a genomic scale, if problems of orthology determination and potential sequencing errors can be overcome. In addition, approaches that examine conflict and support in a simultaneous analysis framework allow for a more precise understanding of the evolutionary history of individual process partitions and may be a novel way to understand functional aspects of different kinds of cellular classes of gene products.

    Automated simultaneous analysis phylogenetics (ASAP): an enabling tool for phylogenomics

    © 2008 Sarkar et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 2.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The definitive version was published in BMC Bioinformatics 9 (2008): 103, doi:10.1186/1471-2105-9-103. The availability of sequences from whole genomes to reconstruct the tree of life has the potential to enable the development of phylogenomic hypotheses in ways that were not previously possible. A significant bottleneck in the analysis of genomic-scale views of the tree of life is the time required for manual curation of genomic data into multi-gene phylogenetic matrices. To keep pace with the exponentially growing volume of molecular data in the genomic era, we have developed an automated technique, ASAP (Automated Simultaneous Analysis Phylogenetics), to assemble these multi-gene/multi-species matrices and to evaluate the significance of individual genes within the context of a given phylogenetic hypothesis. Applications of ASAP may enable scientists to re-evaluate species relationships and to develop new phylogenomic hypotheses based on genome-scale data. This work is funded in part by NSF DBI-0421604 to GC and RD. INS is supported in part by the Ellison Medical Foundation.

    Sources and resources: importance of nutrients, resource allocation, and ecology in microalgal cultivation for lipid accumulation

    Regardless of current market conditions and the availability of conventional petroleum sources, alternatives are needed to circumvent future economic and environmental impacts from the continued exploration and harvesting of conventional hydrocarbons. Diatoms and green algae (microalgae) are eukaryotic photoautotrophs that can utilize inorganic carbon (e.g., CO2) as a carbon source and sunlight as an energy source, and many microalgae can store carbon and energy in the form of neutral lipids. In addition to accumulating useful precursors for biofuels and chemical feedstocks, the use of autotrophic microorganisms can further contribute to reduced CO2 emissions through utilization of atmospheric CO2. Because of the inherent connection between carbon, nitrogen, and phosphorus in biological systems, macronutrient deprivation has been shown to significantly enhance lipid accumulation in different diatom and algae species. However, much work is needed to understand the link between carbon, nitrogen, and phosphorus in controlling resource allocation at different levels of biological resolution (cellular versus ecological). An improved understanding of the relationship between the effects of N, P, and micronutrient availability on carbon resource allocation (cell growth versus lipid storage) in microalgae is needed, in conjunction with life cycle analysis. This mini-review briefly discusses the current literature on the use of nutrient deprivation and other conditions to control and optimize microalgal growth in the context of cell and lipid accumulation for scale-up processes.

    Quantitative cross-species extrapolation between humans and fish: The case of the anti-depressant fluoxetine

    This article has been made available through the Brunel Open Access Publishing Fund. Fish are an important model for the pharmacological and toxicological characterization of human pharmaceuticals in drug discovery, drug safety assessment, and environmental toxicology. However, do fish respond to pharmaceuticals as humans do? To address this question, we provide a novel quantitative cross-species extrapolation approach (qCSE) based on the hypothesis that similar plasma concentrations of pharmaceuticals cause comparable target-mediated effects in both humans and fish at a similar level of biological organization (Read-Across Hypothesis). To validate this hypothesis, the behavioural effects of the anti-depressant drug fluoxetine on the fish model fathead minnow (Pimephales promelas) were used as a test case. Fish were exposed for 28 days to a range of measured water concentrations of fluoxetine (0.1, 1.0, 8.0, 16, 32, and 64 μg/L) to produce plasma concentrations below, equal to, and above the range of Human Therapeutic Plasma Concentrations (HTPCs). Fluoxetine and its metabolite, norfluoxetine, were quantified in the plasma of individual fish and linked to behavioural anxiety-related endpoints. The minimum drug plasma concentrations that elicited anxiolytic responses in fish were above the upper value of the HTPC range, whereas no effects were observed at plasma concentrations below the HTPCs. In vivo metabolism of fluoxetine in humans and fish was similar, and displayed bi-phasic concentration-dependent kinetics driven by the auto-inhibitory dynamics and saturation of the enzymes that convert fluoxetine into norfluoxetine. The sensitivity of fish to fluoxetine was not so dissimilar from that of patients affected by general anxiety disorders. These results represent the first direct evidence of a measured internal dose-response effect of a pharmaceutical in fish, hence validating the Read-Across Hypothesis as applied to fluoxetine. Overall, this study demonstrates that the qCSE approach, anchored to internal drug concentrations, is a powerful tool to guide the assessment of the sensitivity of fish to pharmaceuticals, and strengthens the translational power of cross-species extrapolation.
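    The core qCSE comparison is simple to state: a measured fish plasma concentration is located relative to the HTPC range. The Python sketch below illustrates only that step; the HTPC bounds and measurements are placeholders, not values from the study.

        # Classify a fish plasma level against a Human Therapeutic Plasma
        # Concentration (HTPC) range. All numbers are hypothetical.
        def compare_to_htpc(plasma_ng_ml, htpc_low=30.0, htpc_high=300.0):
            if plasma_ng_ml < htpc_low:
                return "below HTPC range"
            if plasma_ng_ml > htpc_high:
                return "above HTPC range"
            return "within HTPC range"

        for level in (5.0, 120.0, 450.0):  # made-up measurements, ng/mL
            print(level, "ng/mL ->", compare_to_htpc(level))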

    Pentalogy of Cantrell: two patients and a review to determine prognostic factors for optimal approach

    Two patients with incomplete pentalogy of Cantrell are described. The first was a girl with a large omphalocele with evisceration of the heart, liver, and intestines, with an intact sternum. Echocardiography showed profound intracardiac defects. The girl died 33 h after birth. The second patient was a female fetus with ectopia cordis (EC) without intracardiac anomalies; a large omphalocele with evisceration of the heart, stomach, spleen, and liver; a hypoplastic sternum and rib cage; and scoliosis. The pregnancy was terminated. A review of patients described in the literature is presented with the intention of finding prognostic factors for an optimal approach to patients with the pentalogy of Cantrell. In conclusion, the prognosis seems to be poorer in patients with the complete form of the pentalogy of Cantrell, in patients with EC, and in patients with associated anomalies. Intracardiac defects do not seem to be a prognostic factor.

    Modelling the Spread of HIV Immune Escape Mutants in a Vaccinated Population

    Because cytotoxic T-lymphocytes (CTLs) have been shown to play a role in controlling human immunodeficiency virus (HIV) infection, and because CTL-based simian immunodeficiency virus (SIV) vaccines have proved effective in non-human primates, one goal of HIV vaccine design is to elicit effective CTL responses in humans. Such a vaccine could improve viral control in patients who later become infected, thereby reducing onwards transmission and enhancing life expectancy in the absence of treatment. The ability of HIV to evolve mutations that evade CTLs, and the ability of these ‘escape mutants’ to spread amongst the population, pose a challenge to the development of an effective and robust vaccine. We present a mathematical model of within-host evolution and between-host transmission of CTL escape mutants amongst a population receiving a vaccine that elicits CTL responses to multiple epitopes. Within-host evolution at each epitope is represented by the outgrowth of escape mutants in hosts who restrict the epitope and their reversion in hosts who do not restrict the epitope. We use this model to investigate how the evolution and spread of escape mutants could affect the impact of a vaccine. We show that in the absence of escape, such a vaccine could markedly reduce the prevalence of both infection and disease in the population. However, the impact of such a vaccine could be significantly abated by CTL escape mutants, especially if their selection in hosts who restrict the epitope is rapid and their reversion in hosts who do not restrict the epitope is slow. We also use the model to address whether a vaccine should span a broad or narrow range of CTL epitopes and target epitopes restricted by rare or common HLA types. We discuss the implications and limitations of our findings.
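    As a hedged illustration (not the authors' model), the toy recursion below tracks the fraction of new infections founded by an escape variant at a single epitope, with outgrowth of escape in hosts who restrict the epitope and reversion in hosts who do not; every rate and the HLA-restriction frequency are invented.

        # Toy single-epitope model: x is the fraction of onward transmissions
        # that carry the escape mutation. All parameters are hypothetical.
        def escape_prevalence(generations=50, p_restrict=0.2,
                              p_escape=0.9, p_revert=0.3, x=0.0):
            for _ in range(generations):
                # Restricting hosts select for escape before transmitting;
                # non-restricting hosts allow reversion back to wild type.
                from_restricting = p_restrict * (x + (1 - x) * p_escape)
                from_others = (1 - p_restrict) * x * (1 - p_revert)
                x = from_restricting + from_others
            return x

        print(f"escape prevalence after 50 generations: {escape_prevalence():.2f}")

    Lowering p_revert lets escaped virus persist in hosts who do not restrict the epitope, which is the regime the abstract identifies as most damaging to a vaccine's impact.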